02. Q-Learning Algorithm
Summary
Q-Learning is a model-free algorithm, which means it can explore an environment that may not be fully defined. The value of each state-action pair is estimated from observations of that environment. More specifically, Q-Learning is a TD, or Temporal Difference, learning approach, because state changes are learned with the assumption that they are sequential, or time-based.
The Q-Learning algorithm is represented by an iterative update equation that includes a learning rate (α) and a discount factor (γ):

Q(s, a) ← Q(s, a) + α [r + γ max Q(s′, a′) − Q(s, a)]

The learning rate is a value between 0 and 1 and represents the portion of new information that is incorporated into the q-value at each time step. The discount factor is also a value between 0 and 1 and represents the portion of future rewards that influences the new q-value at each time step.
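The update rule described above can be sketched as a small function over a tabular q-table. This is a minimal illustration, not a full implementation; the function and variable names are chosen for clarity and are not from the original lesson.

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one temporal-difference update to the q-value for (state, action)."""
    best_next = max(q_table[next_state])           # best q-value reachable from next_state
    td_target = reward + gamma * best_next         # learned value: reward plus discounted future
    td_error = td_target - q_table[state][action]  # difference from the current estimate
    q_table[state][action] += alpha * td_error     # blend in a portion (alpha) of the new information
    return q_table[state][action]

# usage: two states, two actions, all q-values start at zero
q = [[0.0, 0.0], [0.0, 0.0]]
q_update(q, state=0, action=1, reward=1.0, next_state=1)  # -> 0.5
```

With α = 0.5, exactly half of the learned value (here 1.0, since the next state's q-values are still zero) is blended into the old estimate of 0.0.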
As the agent explores the environment and acquires experience from different state-action pairs, it converges on a policy of action for any given state it observes.
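The convergence described above can be demonstrated on a toy environment. The sketch below assumes a hypothetical one-dimensional corridor of five states with a reward at the right end, and uses epsilon-greedy exploration; none of these details come from the lesson itself.

```python
import random

def train(num_episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1):
    # Toy corridor: states 0..4, actions 0 = left, 1 = right.
    # Reaching state 4 yields reward 1 and ends the episode.
    n_states, n_actions = 5, 2
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(num_episodes):
        state = 0
        while state != 4:
            # epsilon-greedy: mostly exploit the current q-values, sometimes explore
            if random.random() < epsilon:
                action = random.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: q[state][a])
            next_state = min(state + 1, 4) if action == 1 else max(state - 1, 0)
            reward = 1.0 if next_state == 4 else 0.0
            # Q-Learning update for the observed transition
            best_next = max(q[next_state])
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
    return q

q = train()
# Greedy policy: best action in each state. It should favor moving right (1) toward the goal.
policy = [max(range(2), key=lambda a: q[s][a]) for s in range(5)]
```

After training, the greedy policy in states 0 through 3 is to move right, which is exactly the "policy of action for any given state" the agent converges on.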
Quiz 1: Q-Learning Algorithm
SOLUTION:
- The new q-value would just be the old q-value; nothing would be learned.
- The learned value would be ignored.
- The discount factor would have no effect on the new q-value.
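The solution above can be checked numerically: with a learning rate of 0, the update leaves the q-value unchanged, so the learned value and the discount factor drop out entirely. This is a minimal sketch with illustrative names and values.

```python
def new_q_value(q_old, reward, best_next, alpha, gamma):
    # Standard Q-Learning update: blend the old estimate with the learned value.
    return q_old + alpha * (reward + gamma * best_next - q_old)

# With alpha = 0, the new q-value is just the old q-value,
# regardless of the reward, the next state's value, or gamma.
print(new_q_value(q_old=0.25, reward=1.0, best_next=0.8, alpha=0.0, gamma=0.9))  # -> 0.25
print(new_q_value(q_old=0.25, reward=1.0, best_next=0.8, alpha=0.0, gamma=0.0))  # same: 0.25
```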